
Protein Binding Microarrays

Yeast Protein Binding Microarrays

Interactions between transcription factors (TFs) and their DNA binding sites are an integral part of regulatory networks within cells. These interactions control critical steps in progression through normal cellular processes and in responses to various environmental stresses. However, the DNA binding site specificities and regulatory functions of many known and most predicted DNA binding proteins, even in the yeast S. cerevisiae, are unknown. The Bulyk Lab has recently developed an improved in vitro protein binding microarray (PBM) technology to characterize TFs' sequence specificities in a high-throughput manner. The PBM technology allows us to determine the DNA binding site specificity of a protein in a single day. Comparison of binding site specificities determined by the PBM approach with those determined by in vivo genome-wide location analysis (ChIP-chip) indicates that the two can correspond well.

We are in the midst of a large-scale project to characterize the DNA binding site specificities of all known and most predicted S. cerevisiae TFs whose binding specificities are not yet well characterized. To achieve this goal, the Bulyk Lab has teamed up with the Harvard Institute of Proteomics, whose expertise in high-throughput cloning, expression, and purification is helping us to examine essentially all known and predicted yeast TFs. We will analyze these TFs in PBM experiments using whole-genome yeast intergenic microarrays. Through computational analyses that integrate our PBM data with other large-scale genomic and proteomic datasets, we can predict a TF's functional roles. In addition, comparison of the PBM data with ChIP-chip data may provide insights into the usage of individual TF binding sites in vivo.


Universal Protein Binding Microarrays

In order to analyze transcription factors from other species, we created a compact, universal PBM that contains all possible sequence variants of a given length k (i.e., all k-mers). Our design is based on de Bruijn sequences, which represent all k-mers compactly, in an overlapping and unbiased manner. To date we have constructed such ‘all k-mer’ PBMs covering all 10-bp binding sites using high-density Agilent microarrays, enabling us to fit all 1,048,576 10-mers in approximately 44,000 spots. Using these microarrays, we comprehensively determined the binding specificities, over a full range of affinities, of several TFs of diverse structural classes from yeast, worm, mouse, and human. Our universal PBMs permit the discovery of subtle preferences in transcription factor binding sites (including interdependencies among different positions) and can be used with transcription factors from any species, regardless of how well its genome has been characterized.
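The idea of packing all k-mers into overlapping probes can be illustrated with a minimal Python sketch. It uses the standard textbook construction of a de Bruijn sequence and then cuts the sequence into windows that overlap by k-1 bases, so that every k-mer falls within some window. This is an illustration only and not the lab's actual array design procedure: the function names `de_bruijn` and `split_into_probes` and the parameter `kmers_per_probe` are hypothetical, and the sketch ignores practical design constraints such as reverse complements and constant flanking sequences.

```python
def de_bruijn(alphabet: str, k: int) -> str:
    """Return a cyclic de Bruijn sequence of order k over `alphabet`.

    Read cyclically, the result contains every possible k-mer exactly once.
    Standard recursive construction that concatenates Lyndon words.
    """
    n = len(alphabet)
    a = [0] * (n * k)
    seq = []

    def db(t: int, p: int) -> None:
        if t > k:
            if k % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in seq)


def split_into_probes(cyclic_seq: str, k: int, kmers_per_probe: int) -> list[str]:
    """Cut a cyclic de Bruijn sequence into windows overlapping by k-1 bases.

    Each window holds `kmers_per_probe` consecutive k-mers (the last window
    may hold fewer), so every k-mer appears in exactly one window.
    """
    # Append the first k-1 bases so k-mers spanning the wrap-around are kept.
    linear = cyclic_seq + cyclic_seq[:k - 1]
    width = kmers_per_probe + k - 1
    return [linear[start:start + width]
            for start in range(0, len(cyclic_seq), kmers_per_probe)]


# Small demonstration with k = 3 (64 possible 3-mers) instead of k = 10.
demo_seq = de_bruijn("ACGT", 3)
demo_probes = split_into_probes(demo_seq, 3, 8)
demo_kmers = {p[i:i + 3] for p in demo_probes for i in range(len(p) - 2)}
print(len(demo_seq), len(demo_probes), len(demo_kmers))

# The number of 10-mers quoted in the text is 4**10.
print(4 ** 10)  # 1048576
```

Because adjacent windows share k-1 bases, the total amount of sequence needed grows only slightly beyond the number of k-mers, which is what makes it possible to cover all 10-mers with tens of thousands of spots rather than one spot per 10-mer.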

In collaboration with Prof. Manuel Llinas's group at Princeton University, we recently identified the first known putative transcription factors and their DNA binding specificities in the malarial parasite Plasmodium falciparum (De Silva et al., PNAS, 2008, 105(24):8393-8398). This study opens up for further exploration the Apicomplexan AP2 class of proteins as a previously unrecognized class of putative transcriptional regulators in P. falciparum.

We have also recently been using universal PBMs to study known and putative prokaryotic transcriptional regulators.


Analysis of the DNA binding specificities of mouse transcription factors

Microarray gene expression data for nearly 40,000 known and predicted mRNAs have been obtained for 55 mouse tissues. Analysis of these data has shown that similarly expressed genes have similar functional properties (Zhang et al., J. Biol., 2005). In collaboration with Tim Hughes' laboratory, we are using our universal PBMs to determine the DNA binding specificities of hundreds of known and predicted mouse TFs that exhibit tissue-specific expression patterns. Our eventual goal is to understand how these TFs regulate their target genes through action at cis regulatory modules (CRMs). From analysis of the TFs' DNA binding specificities, cross-species conservation, gene expression data, and binding site clustering, we hope to understand which combinations of TFs co-regulate groups of target genes. The results will generate high-confidence predictions of co-regulatory mechanisms and the locations of candidate tissue-specific cis regulatory modules in the mouse genome.

Sequence specificity of homeodomains

The ~60 amino acid homeobox domain, or 'homeodomain', is a conserved DNA-binding protein domain best known for its role in transcriptional regulation during vertebrate development. The homeodomain binds DNA predominantly through interactions between helix 3 (the recognition helix) and the major groove. Since many homeodomains have similar DNA sequence preferences, much attention has been paid to the role of protein-protein interactions in target definition, despite evidence that the sequence specificity of monomers contributes to targeting specificity and that binding sequences do vary, particularly among different subtypes. It has been proposed that the DNA binding specificity of homeodomains is determined by a combinatorial molecular code among the DNA-contacting residues.

The mouse homeodomain complement, estimated at 260 distinct proteins and 275 individual homeodomains, is broadly conserved across animals. We used universal protein binding microarrays (PBMs) containing 41,944 60-mer probes, in which all possible 10-base sequences are represented, to derive DNA binding specificity profiles for 168 mouse homeodomains. Our analysis showed that most homeodomains have distinctive sequence preferences, which may contribute to the strong selective pressure on their amino acid sequences as well as to the biological specificity in target genes and the diversity in function among homeodomain proteins.

In addition to the base-specific contacts made by positions 47, 50, and 54, which are believed to be the main determinants of differences in binding specificity, and by residues in the N-terminal arm, we found additional recognition positions that are predictive of the differences in DNA binding specificity that we observed among related homeodomain proteins.



This page was last updated September 1, 2008